ABSTRACT

We applied the VESPA algorithm to the Sloan Digital Sky Survey final data release of the Main 
Galaxies and Luminous Red Galaxies samples. The result is a catalogue of stellar masses, detailed 
star formation and metallicity histories and dust content of nearly 800,000 galaxies. We make the 
catalogue public via a T-SQL database, which is described in detail in this paper. We present the 
results using a range of stellar population and dust models, and will continue to update the catalogue 
as new and improved models are made public. The data and documentation are currently online, 
and can be found at http://www-wfau.roe.ac.uk/vespa/. We also present a brief exploration of the
catalogue, and show that the quantities derived are robust: luminous red galaxies can be described 
by one to three populations, whereas a main galaxy sample galaxy needs on average two to five; red 
galaxies are older and less dusty; the dust values we recover are well correlated with measured Balmer 
decrements and star formation rates are also in agreement with previous measurements. 
Subject headings: catalogues - galaxies: formation - galaxies: evolution - galaxies: stellar content -
methods: data analysis - surveys



1. INTRODUCTION 

The stellar mass of a galaxy has been shown to correlate
with properties such as luminosity, morphology,
star formation rate, mass density and stellar age, to
name only a few (e.g. Brinchmann and Ellis 2000;
Bell et al. 2003; Heavens et al. 2004; Borch et al. 2006;
Sheth et al. 2006; Bell et al. 2007; Zheng et al. 2007;
Panter et al. 2007). Knowing how these relations evolve
with redshift has been the goal of many observational 
studies, in an attempt to understand the main physical 
processes that drive star formation in galaxies. They 
can also provide strong constraints for models - these 
are normally "tuned" for the local Universe, and seeing 
how well they predict the evolution of these quantities 
with redshift is a very powerful test. 

Even though the stellar content of a galaxy is only
the small tip of the iceberg, it remains a very important
component of the Universe: firstly because we can see
it, and secondly because it holds an imprint of that
galaxy's star formation history, which combined with
that of other galaxies provides information on when, how
and where luminous mass formed in the Universe.

Galaxies' integrated colors alone can provide insight 
about their evolution. The known bimodality of blue 
and red galaxies in a variety of observables seems to
tell us that these two populations are intrinsically
different. Whereas this is useful in its own right, there
is a considerable amount of further information to extract
from galactic light. In Tojeiro et al. (2007) we addressed
the problem of extracting information from a galaxy's
integrated spectrum in a reliable way, and presented
VESPA as a tool to do so. In the current paper, we
present the catalogue resulting from applying VESPA
to the Sloan Digital Sky Survey's (York et al. 2000;
Strauss et al. 2002) final data release, which is now
public and readily accessible.

First and foremost, this sort of analysis requires 
the means of physically interpreting galactic light. A 
galaxy's spectrum can be modelled as a superposition 
of stellar populations of different ages and metallicities, 
if we know the expected flux of each stellar population. 
This is given by stellar population models. 

Single stellar population models (SSPs) have three 
main ingredients. First we need a description of 
the evolution of a star of given mass and metallicity 
in terms of observable parameters, such as effective 
temperature and luminosity (e.g. Alongi et al. 1993;
Bressan et al. 1993; Fagotto et al. 1994a; Girardi et al.
1996; Marigo et al. 2008). This can be calculated (or
at least approximated) analytically, to produce the
so-called isochrones: evolutionary lines for stars of
constant metallicity in a color-magnitude diagram.
Secondly we need to assume an initial mass function 
(IMF), which gives the number of stars per unit stellar
mass, formed from a single cloud of gas (e.g. Salpeter
1955; Chabrier 2001; Kroupa 2003). Different mass
stars evolve with different time-scales, and we can use 
the IMF to populate different evolutionary stages of the 
color-magnitude diagram with the correct proportion 
of stars of any given mass. Finally we need spectral 
libraries, which for a combination of parameters such 
as luminosity or color index, assign a spectrum to a 
star. Spectral libraries can either be drawn from our 
local neighborhood, by taking high quality spectra of
nearby stars (Le Borgne et al. 2003), or they can be
theoretically motivated (e.g. Coelho et al. 2005).

Stellar population models are limited in two main 
ways. Certain advanced stages of stellar evolution, such 
as the supergiant phase, or the asymptotic giant-branch 
phase, are hard to model and not always implemented. 
This leads to uncertainties in the construction of the SSP 
models, which are in this case worsened by the fact that 
these are bright stars which contribute significantly to 
the overall luminous output. If using empirical spectral 
libraries, stellar population models are also limited by 
any bias of the stars in the solar neighborhood. For 
example, the Milky Way is deficient in α-elements (O,
Ne, Mg, Si, S, Ca, Ti), which are indicators of fast
star formation. Nearby stars are biased towards low
[α/Fe], which in turn biases the sample of high quality
stellar spectra available for collection. In this case
theoretical models might help, by explicitly calculating
spectra for a variety of [α/Fe] values (Cervantes et al.
2007; Coelho et al. 2007; Lee et al. 2009). For now, it
should be understood that the metallicity implied by
the SSP models is [Fe/H], which is not degenerate with
α-element abundances.

The point to take home is that an analysis such as 
VESPA (and any others of the same type) is intrinsically 
model dependent, be it on the SSP modeling, IMF or 
dust modeling. The catalogue presented in this paper 
includes analyses done based on different combinations 
of models, and will continue to be updated as new 
models are released to the community, or as necessity 
demands. It is not our immediate goal to distinguish
which models better approximate the real Universe, but 
to provide the user with an opportunity to do so in their 
own studies. 



1.1. Extracting the information 

Extracting information from galactic spectra is a 
much more complex problem than that of extracting 
information from, for example, the cosmic microwave 
background's power spectrum. Firstly we must be clear 
about the parameters we want to extract from the 
data. We are faced with a non-trivial decision, since 
any parametrization we might choose will undoubtedly 
be an over-simplification of the problem - a galaxy is 
almost infinitely more complex than the early Universe. 
However, the quality of the data will often impose a
limit on how many parameters we can safely recover
from the data and one must be careful not to ask for
more than what the data allows. The risk is getting
back a solution which is largely dominated by noise,
rather than real physics (Ocvirk et al. 2006).

From emission to absorption lines, continuum shape 
and spectral large-scale features, a galaxy's spectrum 
is packed with information about the physics of that 
galaxy. Stellar population and dust models provide 
us with a theoretical framework for their interpreta- 
tion, and there are various ways in which one can do this. 

Certain isolated spectral features are known to be
well correlated with physical parameters, such as mass,
star formation rate, mean age, or metallicity of a
galaxy (e.g. Kauffmann et al. 2003; Tremonti et al.
2004; Gallazzi et al. 2005; Barber et al. 2007).
Absorption features are directly related to the chemical
abundances of a stellar population, as they are created
when the black-body emission from the centre of the
star passes through its cooler outer regions. Certain
absorption features, such as the Lick indices, have
been well measured and calibrated so as to provide a
standard set of tools which aid in assigning a physical
meaning to a given absorption line (e.g. Worthey 1994;
Thomas et al. 2003). Emission lines are a sign of recent
star formation: young, massive stars are the only ones
with enough UV emission to ionize their surroundings.
The recombination of the ionized gas creates signature
emission lines, such as Hα and Hβ, whose intensity
(in the absence of dust) can tell us about the abundance
of young stars in a galaxy. UV emission is, in
itself, also a good probe for star formation for exactly
the same reasons (e.g. Madau et al. 1996; Kennicutt
1998; Hopkins et al. 2000; Brinchmann et al. 2004;
Bundy et al. 2006; Erb et al. 2006; Abraham et al. 2007;
Noeske et al. 2007; Salim et al. 2007; Verma et al. 2007).



VESPA focuses on using all of the available absorp- 
tion features, as well as the shape of the continuum, 
in order to interpret a galaxy in terms of its star 
formation history. Emission lines are not included in 
the stellar population models (and are not present 
in every galaxy) and so we do not concentrate on 
these. Other methods have been developed to
accomplish the same task: e.g., Heavens et al. (2000)
(MOPED), Cid Fernandes et al. (STARLIGHT),
Ocvirk et al. (2006) (STECMAP), MacArthur et al.
(2009) and Koleva et al. (2009) (ULySS). These and other
methods acknowledge the same limitation - noise in
the data and in the models introduces degeneracies
into the problem which can lead to unphysical results.
Our approach with VESPA is to adapt the number
of parameters needed to parametrize a galaxy to each
galaxy, taking into account the quality of the data.
Tojeiro et al. (2007) showed how, by using the integrated
spectrum of a galaxy, an appropriate parametrization 
can be found which recovers the maximum amount of 
information from a galaxy without running into the risk 
of over-parametrizing. 

The result is a catalogue of robust and detailed star 
formation and metallicity histories, dust content and 
stellar mass for over 800,000 galaxies in SDSS's seventh 
data release, which we are now making public.

This paper is organized as follows: in Section 2 we
briefly summarize our method (see Tojeiro et al. (2007)
for full details) and the models we use; in Section 3 we
summarize the data and pre-processing procedures; in
Section 4 we describe in detail each of the physical
quantities output by VESPA; in Section 5 we lay out the
technical details of the database and tables, and provide the
user with some example queries for ease of access; in
Section 6 we explore some properties of the catalogue and
finally we present conclusions in Section 7. Where
necessary, we assume a WMAP5 cosmology (Komatsu et al.
2009).

2. METHOD 

We use VESPA to analyze the seventh data release 
of the SDSS spectroscopic sample. The full details
of our method can be found in Tojeiro et al. (2007).
For the sake of clarity and completeness however, 
we include here a summary that focuses on the issues 
that impact directly on how one might use the catalogue. 

In short, VESPA solves the following problem: 



$$F_\lambda = \int_0^{t} f_{\rm dust}(\tau_\lambda, t)\, \psi(t)\, S_\lambda(t, Z)\, dt \qquad (1)$$

where $F_\lambda$ is the observed rest-frame flux of a galaxy, $\psi(t)$ is
the star formation rate (solar masses formed per unit of
time) and $S_\lambda(t, Z)$ is the luminosity per unit wavelength
of a single stellar population of age $t$ and metallicity $Z$,
per unit mass. The dependency of the metallicity on age
is unconstrained, turning this into a non-linear problem.

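The problem consists in recovering the star formation
history of the galaxy, and its dust content, by making
assumptions about the form of $f_{\rm dust}$ and $S_\lambda(t, Z)$. VESPA
accomplishes this by writing the problem in a linear form,
and minimizing

$$\chi^2 = \sum_j \frac{\left(F_j - F_j^{\rm recovered}\right)^2}{\sigma_j^2} \qquad (2)$$

where $j$ represents a given bin in wavelength, $F_j^{\rm recovered}$
is the flux of the recovered model in that bin and $\sigma_j$
is the corresponding noise. Even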
though the problem has an analytical solution, a dataset 
perturbed by noise or which is otherwise deteriorated 
leads to instabilities in the matrix inversion and the 
recovered solutions can be entirely dominated by noise 
(Ocvirk et al. 2006). VESPA has a self-regularization
mechanism which estimates how many independent
parameters one should recover from a given dataset
that has been perturbed. The result is a parametrization 
which varies from galaxy to galaxy, depending on its 
signal-to-noise ratio (SNR) and wavelength coverage. 
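
To make the linear form concrete, the following minimal Python sketch (not the VESPA code) builds a model spectrum as a weighted sum of per-bin model spectra and evaluates a noise-weighted chi-squared of the kind minimized above. The function names, the two toy spectra and the weights are hypothetical and chosen only for illustration; the self-regularization step described in the text is not included.

```python
def model_flux(weights, bin_spectra):
    """Sum the per-bin model spectra, each scaled by its weight."""
    n_pix = len(bin_spectra[0])
    return [
        sum(w * spec[j] for w, spec in zip(weights, bin_spectra))
        for j in range(n_pix)
    ]


def chi_squared(observed, model, sigma):
    """Noise-weighted sum of squared residuals over wavelength bins j."""
    return sum(((f - m) / s) ** 2 for f, m, s in zip(observed, model, sigma))


# Toy example: two hypothetical age bins, three wavelength pixels.
bin_spectra = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
true_weights = [0.25, 0.75]
observed = model_flux(true_weights, bin_spectra)
sigma = [0.1, 0.1, 0.1]

chi2_true = chi_squared(observed, model_flux(true_weights, bin_spectra), sigma)
chi2_off = chi_squared(observed, model_flux([0.5, 0.5], bin_spectra), sigma)
```

In this noiseless toy case the true weights give a chi-squared of zero, while any other set of weights gives a larger value; with real, noisy data the minimum is no longer sharp, which is the regime where regularization matters.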

The overall goal of VESPA is to retrieve a solution 
which is robust, rather than very detailed in look-back 
time - we sacrifice precision for the sake of accuracy. 



2.1. VESPA's bins

VESPA recovers stellar mass fractions for each age bin
$\alpha$, which spans a range of look-back time $\Delta t_\alpha$. The age of
the Universe is split into different-sized bins, as detailed
in Figure 1. In this figure we show a schematic view of
the VESPA bins, together with the unique bin identifier
number used in the published catalogue. The output
for each galaxy generally consists of a combination
of high-resolution bins populated with non-zero star
formation, low resolution populated bins, and empty
bins. Figure 2 shows the results for two galaxies with
very different bin configurations.

Next we describe how we compute the SSP model flux 
of each of these bins, given a set of models $S(\lambda, t, Z)$.

2.1.1. High-resolution age bins 

At our highest resolution (HR) we work with 16 age
bins, equally spaced in a logarithmic time scale between
0.002 Gyr and the age of the Universe. In each bin, we
assume a constant star formation rate

$$f_\alpha^{HR}(\lambda, Z) = \psi \int_{\Delta t_\alpha} S(\lambda, t, Z)\, dt \qquad (3)$$

with

$$\psi = 1/\Delta t_\alpha. \qquad (4)$$

2.1.2. Low-resolution age bins



We work on a grid of different resolution time bins and
we construct the low resolution bins (LR) using the high
resolution bins described in Section 2.1.1. We do not
assume a constant star formation rate in this case, as in
wider bins the light from the younger components would
largely dominate over the contribution from the older
ones. Instead, we use a decaying star formation history,
such that the light contributions from all the components
are comparable. We start by writing

$$f_\beta^{LR}(\lambda, Z) = \int_{\Delta t_\beta} \psi(t)\, S(\lambda, t, Z)\, dt, \qquad (5)$$

which we approximate to

$$f_\beta^{LR}(\lambda, Z) = \frac{\sum_{\alpha \in \beta} f_\alpha^{HR}(\lambda, Z)\, \psi_\alpha \Delta t_\alpha}{\sum_{\alpha \in \beta} \psi_\alpha \Delta t_\alpha} \qquad (6)$$

where low resolution bin $\beta$ incorporates the high resolution
bins $\alpha \in \beta$, and we set

$$\psi_\alpha \Delta t_\alpha = \frac{1}{\int_\lambda f_\alpha^{HR}(\lambda, Z)\, d\lambda}. \qquad (7)$$
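
The binning scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the VESPA implementation: the upper age limit of 14 Gyr is an assumed stand-in for the age of the Universe, the function names are hypothetical, and the integral over wavelength is replaced by a plain sum over pixels of equal width. Each high-resolution spectrum is weighted by the inverse of its integrated flux, so that all components contribute comparable light to the low-resolution bin.

```python
import math


def log_bin_edges(t_min_gyr, t_max_gyr, n_bins):
    """Edges of n_bins age bins equally spaced in log10(look-back time)."""
    lo, hi = math.log10(t_min_gyr), math.log10(t_max_gyr)
    step = (hi - lo) / n_bins
    return [10.0 ** (lo + i * step) for i in range(n_bins + 1)]


def combine_hr_bins(hr_spectra):
    """Merge high-resolution bin spectra into one low-resolution spectrum.

    Each HR spectrum is weighted by 1 / (its summed flux), a discrete
    stand-in for the wavelength integral, and the result is normalised
    by the sum of the weights.
    """
    weights = [1.0 / sum(spec) for spec in hr_spectra]
    norm = sum(weights)
    n_pix = len(hr_spectra[0])
    return [
        sum(w * spec[j] for w, spec in zip(weights, hr_spectra)) / norm
        for j in range(n_pix)
    ]


# 16 HR bins between 0.002 Gyr and an ASSUMED upper limit of 14 Gyr.
edges = log_bin_edges(0.002, 14.0, 16)

# Toy HR spectra: a bright, blue young bin and a faint, red old bin.
young = [8.0, 4.0]
old = [1.0, 2.0]
lr_spectrum = combine_hr_bins([young, old])
```

With these toy numbers the bright young component is down-weighted by a factor of four relative to the old one, so neither dominates the combined spectrum.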



2.2. Models 



VESPA can work with any set of SSP, IMF and dust 
models, and the solutions it recovers are inevitably 
model-dependent. 



2.2.1. SSP Modeling 

The modeling of SSPs is still very much an active 
and developing field. We chose to publish the catalogue 
using more than one set of SSP models to give the 
user the opportunity to check how their results fare 
with different models. At the time of this writing, 
we are publishing results obtained with the models
of Bruzual and Charlot (2003) (BC03) and Maraston
(2005) (M05). As different sets of models become
available to the community, or as observational data
demand, we intend to update the catalogue accordingly.

Fig. 1. - Schematic view of the grid of bins used by VESPA. The top line of black numbers indicates the age of each boundary, in Gyr.
The red number in each of the bins is a unique bin identifier number, which can be used to quickly retrieve properties of a given bin.

With the BC03 models we adopt a Chabrier initial
mass function (Chabrier 2003) and Padova 1994
evolutionary tracks (Alongi et al. 1993; Bressan et al.
1993; Fagotto et al. 1994a,b; Girardi et al. 1996). We
interpolate metallicities using tabulated values at: Z =
0.0004, 0.004, 0.008, 0.02 and 0.05. We use the red
horizontal branch models of Maraston (2005) with a
Kroupa initial mass function (Kroupa 2007). Models
are supplied at metallicities of Z = 0.0004, 0.01, 0.02
and 0.04. Both models are normalized to $1\,M_\odot$ at $t = 0$.



2.2.2. Dust Modeling 

The simplest approach to dust modeling is to assume 
that stars of all ages are affected in the same way: 



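$$F_\lambda = f_{\rm dust}(\tau_\lambda) \int_0^{t} \psi(t)\, S_\lambda(t, Z)\, dt, \qquad (8)$$

where $\tau_\lambda$ is the optical depth as a function of
wavelength, and which we model as

$$\tau_\lambda = \tau_V \left(\frac{\lambda}{5500\,\text{Å}}\right)^{-0.7}. \qquad (9)$$

A comparison of this curve with the estimated extinction
curve of the LMC by Gordon et al. (2003) can be seen in
Figure 3. In this figure both curves are normalised such
that $\tau_{5550\,\text{Å}} = 1$.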

There is a variety of choices for the form of $f_{\rm dust}(\tau_\lambda)$.
We use the mixed slab model of Charlot and Fall (2000)
for low optical depths ($\tau_V \le 1$), for which

$$f_{\rm dust}(\tau_\lambda) = \frac{1}{2\tau_\lambda}\left[1 + (\tau_\lambda - 1)\exp(-\tau_\lambda) - \tau_\lambda^2 E_1(\tau_\lambda)\right] \qquad (10)$$

where $E_1$ is the exponential integral and $\tau_\lambda$ is the optical
depth of the slab. This model is known to be less accurate
for high dust values, and for optical depths greater than
one we take a uniform screening model with

$$f_{\rm dust}(\tau_\lambda) = \exp(-\tau_\lambda). \qquad (11)$$
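
As an illustration of the one-parameter attenuation described above, the sketch below evaluates a mixed slab model at low optical depth and a uniform screen otherwise, using only the Python standard library. It is a sketch under stated assumptions rather than the VESPA code: the function names are hypothetical, the exponential integral is computed from its power series (adequate for the small arguments used here), the extinction curve is assumed to be a power law of index -0.7 normalised at 5500 Å, and the switch between the two models is made on the V-band optical depth.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp_integral_e1(x, n_terms=60):
    """E1(x) for x > 0 from the series -gamma - ln(x) - sum (-x)^k / (k k!)."""
    total = 0.0
    term = 1.0
    for k in range(1, n_terms + 1):
        term *= -x / k          # term is now (-x)^k / k!
        total += term / k
    return -EULER_GAMMA - math.log(x) - total


def optical_depth(tau_v, wavelength_angstrom):
    """Assumed power-law extinction curve, normalised at 5500 Angstrom."""
    return tau_v * (wavelength_angstrom / 5500.0) ** -0.7


def mixed_slab(tau):
    """Attenuation factor of a mixed slab of optical depth tau."""
    if tau == 0.0:
        return 1.0
    bracket = 1.0 + (tau - 1.0) * math.exp(-tau) - tau ** 2 * exp_integral_e1(tau)
    return bracket / (2.0 * tau)


def uniform_screen(tau):
    """Attenuation factor of a uniform foreground screen."""
    return math.exp(-tau)


def f_dust(tau_v, wavelength_angstrom):
    """Mixed slab for tau_V <= 1, uniform screen otherwise."""
    tau = optical_depth(tau_v, wavelength_angstrom)
    return mixed_slab(tau) if tau_v <= 1.0 else uniform_screen(tau)
```

For example, `f_dust(0.5, 4000.0)` gives the fraction of light transmitted at 4000 Å for a V-band optical depth of 0.5. A library routine for the exponential integral would normally be preferable to the hand-written series when one is available.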

We call this model our one-parameter dust model.
We also apply a two-parameter dust model by following
Charlot and Fall (2000) and set

$$f_{\rm dust}(\tau_\lambda, t) = \begin{cases} f_{\rm dust}(\tau_\lambda^{ISM})\, f_{\rm dust}(\tau_\lambda^{BC}), & t \le t_{BC} \\ f_{\rm dust}(\tau_\lambda^{ISM}), & t > t_{BC} \end{cases} \qquad (12)$$

where $\tau_\lambda^{ISM}$ refers to the inter-stellar medium and
$\tau_\lambda^{BC}$ to the birth cloud. This allows stellar populations
younger than $t_{BC}$ to have extra extinction in relation
to the older populations. We only use the uniform
screening model to model the dust in the birth cloud
and we use $\tau_\lambda = \tau_V (\lambda/5500\,\text{Å})^{-0.7}$ as our extinction
curve for both environments. We set $t_{BC} = 0.03$ Gyr.

Fig. 2. - Two SDSS galaxies analysed with VESPA. In each case the top panel shows the observed and fitted spectrum (black and red,
respectively; only the fitted regions are shown), the second panel the residuals, the third panel the recovered star formation mass fractions 
and in the bottom panel we show the recovered metallicity in each age bin. The example on the right shows a galaxy from which little 
information could safely be recovered which is translated into large age bins. The interpretation should be that the majority of this 
galaxy's mass was formed 11-14 Gyrs ago in the rest-frame, but we cannot tell more precisely when, within that interval, this happened. 
The example on the left shows a galaxy with a history which is better resolved. 

Fig. 3. - Two commonly used dust extinction curves. The solid
line shows a simple model that follows $\lambda^{-0.7}$ and is used throughout
this paper. The dashed line shows the extinction curve
estimated directly from the Large Magellanic Cloud by Gordon et al.
(2003). Curves have been normalised to unity at $\lambda = 5550$ Å.

As described, dust is a non-linear problem. In practice,
we solve the linear problem described by equation (1)
with a number of dust extinctions applied to the models
$S(t, Z)$ and choose the values of $\tau_V^{ISM}$ and $\tau_V^{BC}$ which
result in the best fit to the data. We initially use a
binary chop search for $\tau_V^{ISM} \in [0, 4]$ and keep $\tau_V^{BC}$ fixed
and equal to zero, which results in trying out typically
around nine values of $\tau_V^{ISM}$. If this initial solution
reveals star formation at a time less than $t_{BC}$ we repeat
our search on a two-dimensional grid, and fit for $\tau_V^{ISM}$
and $\tau_V^{BC}$ simultaneously. There is no penalty except in
CPU time to apply the two-parameter search, but we
find that this procedure is robust (Tojeiro et al. 2007).
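
The one-dimensional search can be illustrated with a simple interval-halving scheme. The details of the search used in VESPA are not given here, so the variant below is an assumption: it keeps a bracket around the best value found so far and halves it at each step, and the `cost` argument is a hypothetical stand-in for the best-fit chi-squared of the linear problem at a given $\tau_V^{ISM}$. With four halving steps it evaluates nine trial values, comparable to the number quoted in the text.

```python
def binary_chop_minimum(cost, lo=0.0, hi=4.0, n_steps=4):
    """Interval-halving search for the minimum of a unimodal cost on [lo, hi].

    Returns (best_value, best_cost, number_of_cost_evaluations).
    """
    mid = 0.5 * (lo + hi)
    f_mid = cost(mid)
    evaluations = 1
    for _ in range(n_steps):
        left = 0.5 * (lo + mid)
        right = 0.5 * (mid + hi)
        f_left = cost(left)
        f_right = cost(right)
        evaluations += 2
        if f_left < f_mid:
            # Minimum lies in the lower half of the bracket.
            hi, mid, f_mid = mid, left, f_left
        elif f_right < f_mid:
            # Minimum lies in the upper half of the bracket.
            lo, mid, f_mid = mid, right, f_right
        else:
            # Current midpoint is best: shrink the bracket around it.
            lo, hi = left, right
    return mid, f_mid, evaluations


# Hypothetical cost with a minimum at tau_V = 1.3.
best_tau, best_cost, n_eval = binary_chop_minimum(lambda tau: (tau - 1.3) ** 2)
```

After four steps the bracket has shrunk from a width of 4 to 0.25, so the returned value lies within that distance of the true minimum, provided the cost is unimodal over the search range.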



2.3. Mock galaxies and wavelength range 

Any full spectral fitting code is dependent on the
wavelength range which is fitted and in Tojeiro et al.
(2007) we showed that an increased coverage allows for
a more precise and more resolved solution. Here we
take the opportunity to take this analysis further by
explicitly examining how the wavelength shift due to
redshift affects the recovered solutions.

We take a similar approach to Tojeiro et al. (2007)
and simulate mock galaxies with two types of star
formation histories: an exponentially decaying star
formation rate (SFR $\propto \exp(\gamma t_\alpha)$ with $\gamma = 0.3$ Gyr$^{-1}$), and a
dual-burst history. We add Gaussian white noise for a
SNR per 3 Å pixel of 50, and we take a fixed Earth-frame
wavelength coverage of [3800, 9200] Å, excluding
emission-line regions, as detailed in Section 3.2. In all
cases the metallicity is assigned randomly for each age.
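
A mock of the first type can be sketched as below. This is an illustration under assumptions, not the procedure used to build the mocks in this paper: the star formation rate is taken to grow exponentially with look-back time (so that it decays towards the present), the mass in each bin is obtained by integrating that rate across the bin, the bin edges are toy values, and the noise is taken to have a standard deviation equal to the flux divided by the SNR in every pixel. All function names are hypothetical.

```python
import math
import random


def exponential_sfh_fractions(bin_edges_gyr, gamma=0.3):
    """Mass fraction per age bin for SFR proportional to exp(gamma * t).

    t is look-back time in Gyr; the SFR is integrated across each bin
    and the result is normalised so that the fractions sum to one.
    """
    masses = [
        (math.exp(gamma * t_old) - math.exp(gamma * t_young)) / gamma
        for t_young, t_old in zip(bin_edges_gyr[:-1], bin_edges_gyr[1:])
    ]
    total = sum(masses)
    return [m / total for m in masses]


def add_white_noise(flux, snr, rng):
    """Add Gaussian noise with per-pixel sigma = |flux| / snr."""
    sigma = [abs(f) / snr for f in flux]
    noisy = [f + rng.gauss(0.0, s) for f, s in zip(flux, sigma)]
    return noisy, sigma


# Toy bin edges in Gyr of look-back time (illustrative only).
fractions = exponential_sfh_fractions([0.0, 1.0, 2.0, 3.0])

# Seeded generator so that the mock is reproducible.
rng = random.Random(42)
noisy_flux, sigma = add_white_noise([10.0, 20.0, 30.0], snr=50.0, rng=rng)
```

For bins of equal width the oldest bin receives the largest mass fraction, as expected for a star formation rate that declines towards the present day.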

We assess the quality of the recovered solutions in
terms of the star formation history, metallicity history,
and total stellar mass. Here we depart from our previous
methodology and calculate goodness-of-fit estimates, in
solution space, taking into account the recovered errors
(see Section 4.2 for details on how these are calculated).
To avoid making the database too large we only keep the
diagonal components of the covariance matrices, so we
approximate $\chi^2_{SFH}$ by:

$$\chi^2_{SFH} = \sum_\alpha \frac{\left(x_\alpha^{I} - x_\alpha^{R}\right)^2}{C_{\alpha\alpha}(x)} \qquad (13)$$

where $x_\alpha^{I}$ and $x_\alpha^{R}$ are the input and recovered mass
in bin $\alpha$, and $C(x)$ is the covariance matrix for the
mass (see Section 4.2). We compute $\chi^2_Z$ in an identical
fashion. To assess how well we recover stellar mass we
simply calculate $(M^I - M^R)/\sigma_M$, where $\sigma_M$ is given by
the mass covariance matrix, and $M^I$ and $M^R$ are the
input and recovered total stellar mass, respectively.

We expect $\chi^2_{SFH}$ and $\chi^2_Z$ to be distributed like a
$\chi^2$ distribution with n degrees of freedom. The
complication, however, is that n varies from galaxy to galaxy, as
the number of bins changes. Given that we are dealing
with uniform samples of galaxies, we do not expect this
variation to be very large but it makes more sense to
compute a reduced value of $\chi^2_{SFH}$ and $\chi^2_Z$ in order to
make the distributions more uniform. We compare the
obtained distributions with the expected distribution
with 5 degrees of freedom, as this is the average number
of populations recovered.
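
The solution-space statistics described above can be sketched in a few lines. This is a minimal illustration, assuming that only the diagonal of the covariance matrix is available and that the number of degrees of freedom equals the number of recovered bins; the function names and the toy numbers are hypothetical.

```python
def reduced_chi2_solution(x_input, x_recovered, variances):
    """Reduced chi-squared in solution space using diagonal variances only."""
    n = len(x_input)
    chi2 = sum(
        (x_in - x_rec) ** 2 / var
        for x_in, x_rec, var in zip(x_input, x_recovered, variances)
    )
    return chi2 / n


def mass_pull(mass_input, mass_recovered, sigma_mass):
    """Difference between input and recovered total mass in units of its error."""
    return (mass_input - mass_recovered) / sigma_mass


# Toy galaxy with two populated bins.
chi2_red = reduced_chi2_solution([0.2, 0.8], [0.3, 0.7], [0.01, 0.04])
pull = mass_pull(1.0e10, 0.9e10, 0.2e10)
```

If the recovered errors are well estimated, the reduced statistic should scatter around unity and the mass pull should follow a unit Gaussian across a sample of mocks.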

Figures 4 and 6 show these distributions for a set of 50
mock galaxies at redshifts of 0.05 and 0.2, with an expo- 
nentially decaying and a dual-burst SFH respectively. 

In the exponentially decaying case, we find that recov- 
ered star formation history is perfectly consistent with 
the expected distribution, meaning that the recovered 
solutions and their errors are well estimated. In the 
case of metallicity, however, the recovered distributions 
seem incompatible. This is dominated by the fact that 
in most cases, the stellar mass formed in each bin is 
very low (with the exception of the two oldest), which 
makes it very hard to accurately recover the metallicity. 
If we restrict our analysis to the two oldest bins, the 
resulting $\chi^2_Z$ is much more pleasing, as we show in
Figure 5 - here the expected distribution is computed
for n = 2. Figure 5 suggests that perhaps the errors on
those two bins are over-estimated, but the recovered
solutions are excellent. The middle panel in Figure
4 is a stark reminder that recovered metallicities for
populations which contribute very little to the light of
the galaxy are poorly constrained, as we pointed out
in Tojeiro et al. (2007). The total stellar mass is well
recovered, although we see signs that our total error is
over-estimated by roughly a factor of two. The error
is determined from the scatter of the recovered solutions
of a set of mock galaxies, each generated with a SFH
identical to that recovered, but with different noise. The
SFH is not the true SFH of the galaxy (since this is not
available in the real world), and this leads to an error
estimate which is somewhat overestimated. In practice,
however, this is not significant in real galaxies, where
other factors are more important; namely the modelling
and spectro-photometric calibrations (see e.g. Figure 14
of Tojeiro et al. (2007)).

Finally, we note that there is no significant difference 
in the recovered solutions as a function of redshift. 

In the case of a dual-burst history, as shown in Figure
6, we see that in general the star formation
history is not as well recovered, given the errors. There
is a wider variety in n from galaxy to galaxy, so
the differences for low values of reduced $\chi^2_{SFH}$ and
$\chi^2_Z$ are not worrying, but the excess of galaxies with
poor values of goodness-of-fit is. Some of these objects
also have an improbably poor goodness-of-fit in data space,
which is always an indication that the solutions are
poor, but the majority do not. The latter, however,
have very wide bins. Take the example in Figure 7.
The recovered mass fractions (plotted) are accurate,
but the recovered absolute mass is not. This is because
there is an implied star formation rate within each bin
that ultimately determines the mass-to-light ratio of
that bin, which in turn gives the absolute mass
recovered. The wider the bin, the more important
this assumption becomes. Therefore, one should be 
careful when interpreting histories which are very poorly 
resolved. The metallicities tend to be better recovered 
in this case because there are fewer age bins with very 
low star formation. The scatter in total stellar mass is 
larger, but still good. 

Once again we fail to see any systematic offset due to 
the different redshifts of the mock galaxies. 

2.3.1. Emission-line regions 

A different question concerns the effect that removing 
emission-line regions has on the recovered solutions. 
As will be detailed in Section 3.2, we remove certain
parts of the rest-frame wavelength range which corre- 
spond to common emission lines. We do this for all 
galaxies, irrespective of color, morphology or detection 
of emission lines by the SDSS pipeline, and have also 
removed them from our mock analysis in the previous 
section. However, some of the absorption features in 
these regions can be important for the recovery of the 
solutions. 

As a test, we repeat the analysis done above, but this 
time using the full spectral range between [3800, 9200] A 
in the Earth-frame. We use the same galaxies as above, 
in the case of an exponentially decaying SFH, and at a 
redshift of 0.05. The results can be seen in Figure 8.
The red line in this Figure should be compared to the
red solid line of Figure 4 - the two distributions are very
similar, indicating that removing these regions from the
wavelength coverage does not affect the results in any
significant way.

Fig. 4. - $\chi^2_{SFH}$, $\chi^2_Z$ and $(M^I - M^R)/\sigma_M$ for 50 mock galaxies with an exponentially decaying star formation history, at a redshift of
0.05 (red solid line) and 0.2 (red dashed line). For reference, we also show the expected distributions in the same binning scheme (black
solid line). See main text for discussion.

Fig. 5. - The distribution of $\chi^2_Z$ for the two oldest bins only,
in mock galaxies with an exponentially decaying star formation
history, at a redshift of 0.05 (red solid line) and 0.2 (red dashed
line). For reference, we also show the expected distribution in the
same binning scheme (black solid line). The contrast between this
distribution and that shown in the middle panel of Figure 4 shows
how metallicities of populations which have a low contribution in
mass/light are highly unconstrained.

3. DATA 

We analysed the final data releases of the Main Galaxy 
Sample (MGS) and the Luminous Red Galaxies Sample 
(LRGS). 

The main galaxy sample (Strauss et al. 2002) is a
magnitude-limited, high-completeness (> 99%) galaxy
sample, selected in the r-band. In its final data release it
consists of over 700,000 galaxies, with a median redshift
of approximately 0.11. Galaxies are selected according
to three criteria: a star-galaxy separation test, a cut in
the r'-band Petrosian magnitude (from here on we will
refer to SDSS's r'- as r-band, and similarly to the other 
four bands), and a cut in surface brightness. To avoid 
very poor quality data we impose a further limit on the 
surface brightness < 23. 



The LRG sample is selected based on g - r and r - i
colors and r-band and z-band magnitudes, as detailed
in Eisenstein et al. (2001). The cuts were designed
to follow an LRG fiducial model, and target galaxies
which are intrinsically bright and red from a redshift of
around 0.2 to 0.6.

In both cases we use the bulk spectra samples provided
in http://www.sdss.org/dr7/products/spectra/getspectra.html

3.1. Quality of the fits 

As in Tojeiro et al. (2007), we find that the values of
reduced $\chi^2$ still fall short of what is formally a good
fit. The reason remains the limitations of the models in
describing all the features of real spectra. This is visible
in the plotted residuals of Figure 2, defined here as the
difference between the input and the recovered spectrum,
in units of the noise per wavelength bin. The left-over
structure in these residuals can in fact tell us something
about what these limitations are, and with careful
analysis tell us which sets of models give a better
prescription of real galaxies - see Panter et al. (2007).

The data reduction procedure can also impact on the
quality of the fits. We find that the new spectro-photometric
calibration pipeline, introduced in DR6, impacts
visibly and positively on the quality of the fits we
recover (see Section 6.2), but still does not allow values
of reduced $\chi^2$ of unity. For further discussion, refer to
Tojeiro et al. (2007).

3.2. Handling SDSS data 
3.2.1. Galactic extinction 

Spectra in DR7 do not include a correction for dust
extinction due to our own galaxy. We use the Galactic
dust maps by Schlegel et al. (1998) to obtain a value
of E(B-V) for each spectroscopic plate. We estimate
the un-obscured flux using the dust extinction curve of
O'Donnell (1994), which assumes a uniform dust screen.
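
For a uniform foreground screen the correction is a multiplicative factor per wavelength. The sketch below shows only that final step and is not the pipeline used for the catalogue: the function name is hypothetical, and the values of $k(\lambda) = A_\lambda / E(B-V)$ in the example are illustrative placeholders rather than values of any published extinction curve, which would have to be supplied by the user.

```python
def deredden(flux, ebv, k_lambda):
    """Correct fluxes for a uniform foreground dust screen.

    flux     : observed fluxes, one per wavelength pixel
    ebv      : colour excess E(B-V) along the line of sight
    k_lambda : A_lambda / E(B-V) at each pixel, taken from the chosen
               extinction curve (must be supplied by the caller)
    """
    return [f * 10.0 ** (0.4 * k * ebv) for f, k in zip(flux, k_lambda)]


# Placeholder k(lambda) values for three pixels, blue to red (NOT a real curve).
k_example = [4.0, 3.1, 2.5]
corrected = deredden([1.0, 1.0, 1.0], ebv=0.1, k_lambda=k_example)
unchanged = deredden([1.0, 2.0, 3.0], ebv=0.0, k_lambda=k_example)
```

Because extinction is stronger in the blue, the correction factor is largest for the bluest pixel, and it reduces to unity when E(B-V) is zero.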

3.2.2. Pre-processing 

Fig. 6. - $\chi^2_{SFH}$, $\chi^2_Z$ and $(M^I - M^R)/\sigma_M$ for 50 mock galaxies with a dual-burst history, at a redshift of 0.05 (red solid line) and 0.2
(red dashed line). For reference, we also show the expected distributions in the same binning scheme (black solid line). See main text for
discussion.

Fig. 7. - An example of a mock galaxy where the $\chi^2_{SFH}$ is
improbably poor, in spite of the good fit in data space. In this case,
this is due to the wide bin and the assumed star formation rate -
see text for details.



Prior to any analysis, we processed the SDSS spec- 
troscopic data, so as to accomplish the desired spectral 
resolution and mask out any unwanted signal. 

The SDSS data files supply a mask vector, which flags 
any potential problems with the measured signal on a 
pixel-by-pixel basis. We use this mask to remove any 
unwanted regions and emission lines. In practical terms,
we ignore any pixel for which the provided mask value
is not zero.

The BC03 synthetic models produce outputs at a resolution
of 3 Å, which we convolve with a Gaussian velocity
dispersion curve with a stellar velocity dispersion $\sigma_V = 170$ km s$^{-1}$,
this being a typical value for SDSS galaxies (Panter et al.
2007). Panter et al. (2007) have also shown that there
is no significant effect on the recovered star formation
and metallicity histories as a result of the adoption of
a single value of velocity dispersion. At the expense
of CPU time, one can lift this limitation, but the data
is currently too poor to justify this. One may worry
about a metallicity - velocity dispersion degeneracy, as
reported in Koleva et al. (2008). This is at the moment
of no concern, given the quality of the data and the
dependence of the recovered metallicity values on the
SSP model chosen (see Section 6.3), which is dominant.
The M05 models are given at a resolution of 20 Å, and
we do not apply any further dispersion in this case.

We take the models' tabulated wavelength values as a
fixed grid and re-bin the SDSS data into this grid, using
an inverse-variance weighted average. We compute the
new error vector accordingly. Note that the number
of SDSS data points averaged into any new bin is not
constant, and that the re-binning process is done after
we have masked out any unwanted pixels. In addition
to the pixels flagged by the mask vector, we mask out
the following emission line regions in every spectrum's
rest-frame wavelength range: [5885-5900, 6702-6732,
6716-6746, 6548-6578, 6535-6565, 6569-6599, 4944-4974,
4992-5022, 4846-4876, 4325-4355, 4087-4117, 3711-3741,
7800-11000] Å. These regions were determined by visual
inspection of over 1000 galaxy fit residuals (Panter et al.
2007).
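
The masking and re-binning steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the VESPA implementation: the function name is hypothetical, the model grid is represented by a set of bin edges (`bin_edges`), and pixels with a non-zero mask value are simply given zero weight.

```python
import numpy as np

def rebin_inverse_variance(wave, flux, err, mask, bin_edges):
    """Average unmasked pixels into fixed bins with inverse-variance weights.

    Returns the re-binned flux and its error; bins with no usable pixel are NaN.
    """
    wave, flux, err = (np.asarray(a, dtype=float) for a in (wave, flux, err))
    # Keep only pixels whose mask value is zero (and whose error is usable)
    good = (np.asarray(mask) == 0) & (err > 0)
    weights = np.where(good, 1.0 / np.where(good, err, 1.0) ** 2, 0.0)

    # Bin index of every pixel; pixels outside the grid are dropped
    idx = np.digitize(wave, bin_edges) - 1
    n_bins = len(bin_edges) - 1
    inside = (idx >= 0) & (idx < n_bins)

    w_sum = np.bincount(idx[inside], weights=weights[inside], minlength=n_bins)
    wf_sum = np.bincount(idx[inside], weights=(weights * flux)[inside], minlength=n_bins)

    with np.errstate(divide="ignore", invalid="ignore"):
        new_flux = wf_sum / w_sum          # inverse-variance weighted mean
        new_err = 1.0 / np.sqrt(w_sum)     # error of the weighted mean
    empty = w_sum == 0
    new_flux[empty] = np.nan
    new_err[empty] = np.nan
    return new_flux, new_err
```

Because masked pixels carry zero weight, the number of pixels contributing to each new bin varies, as noted in the text. The additional rest-frame emission-line regions could be handled by setting the mask to a non-zero value for the affected pixels before calling the function.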

These re-binned data- and noise-vectors are essentially
the ones we use in our analysis. However, since the
linear algebra assumes white noise, we pre-whiten the
data and construct a new flux vector $F'_j = F_j/\sigma_j$, which
has unit variance, $\sigma'_j = 1, \forall j$, and a new model matrix,
scaled by $1/\sigma_j$ in the same way.

The wavelength vector is shifted to the galaxy's rest
frame.

Fig. 8. - Distribution of the mass residuals in units of $\sigma_M$ for 50 mock galaxies with an exponentially decaying star formation history, at a redshift of
0.05 (red solid line). For reference, we also show the expected distributions in the same binning scheme (black solid line). The analysis is
the same as seen in the red solid line of Figure 4, but with an extended wavelength coverage. See main text for discussion.
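
As an illustration of the pre-whitening step, the sketch below divides both the data and the models by the noise vector. The function name is hypothetical, and the assumption that the model matrix is scaled pixel by pixel in the same way as the flux is ours.

```python
import numpy as np

def prewhiten(flux, sigma, model_matrix):
    """Scale data and models by the noise so that every pixel has unit variance.

    flux          re-binned flux vector, shape (n_pixels,)
    sigma         re-binned noise vector, shape (n_pixels,)
    model_matrix  model fluxes, shape (n_models, n_pixels)
    """
    flux = np.asarray(flux, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    white_flux = flux / sigma
    # Broadcasting divides every model spectrum by the same noise vector
    white_models = np.asarray(model_matrix, dtype=float) / sigma
    return white_flux, white_models
```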

4. THE CATALOGUE

The catalogue is published as a query-based T-SQL
database. The data is organised in tables, which we
describe in Section 5. The principal physical properties
provided by VESPA are summarised in Table 1.
In the next section we describe the meaning and
calculation of each of these quantities.

Symbol                    Units        Description
$x_\alpha$                -            star formation fraction in bin $\alpha$
$u_\alpha$                $M_\odot$    stellar mass formed in bin $\alpha$
$m_\alpha$                $M_\odot$    recycled stellar mass in bin $\alpha$
$C_{\alpha\beta}(x)$      -            covariance matrix for the star formation fractions
$Z_\alpha$                -            mass-weighted metallicity for bin $\alpha$
$C_{\alpha\beta}(Z)$      -            covariance matrix for the metallicities
$M_*$                     $M_\odot$    recycled stellar mass in galaxy
$\sum_\alpha u_\alpha$    $M_\odot$    total stellar mass formed in galaxy
$\tau_V^{\rm ISM}$        -            dust extinction due to the inter-stellar medium for all populations
$\tau_V^{\rm BC}$         -            dust extinction due to the birth cloud for young populations

TABLE 1
Galaxy properties which are derived by VESPA. Where
appropriate, quantities are corrected for fiber aperture.

4.1. Masses and mass fractions

VESPA recovers star formation fractions, which we
transform into absolute masses as follows.

If $\lambda$ is in the observed frame, then we can relate the flux
of a galaxy to the emitted luminosity by

$$F_\lambda = \frac{L_{\lambda/(1+z)}}{4\pi D_L^2 (1+z)} \qquad (14)$$

In practical terms we do the following. We can rewrite
equation (1) in a simplified way:

$$F_j = \sum_\alpha x_\alpha S_{\alpha j} \qquad (15)$$

where $F_j$ has been shifted to the rest frame of the
source, and $x_\alpha$ is the star formation fraction formed in
bin $\alpha$.

The luminosity of a galaxy, written in terms of our
models, is simply $L_j = \sum_\alpha u_\alpha S_{\alpha j}$, where $u_\alpha$ is the stellar
mass formed at age $\alpha$. From (14):

$$\sum_\alpha x_\alpha S_{\alpha j} = \frac{\sum_\alpha u_\alpha S_{\alpha j}}{4\pi D_L^2 (1+z)} \qquad (16)$$

which for any given age bin $\alpha$ gives the mass formed
in that bin in units of solar masses:

$$u_\alpha = x_\alpha \, 4\pi D_L^2 (1+z). \qquad (17)$$

We distinguish between the stellar mass ever formed in a
galaxy, and the stellar mass remaining in a galaxy today:

$$M(t) = \int \psi(t') \left[1 - R(t - t')\right] dt' \qquad (18)$$

where $R(t - t')$ is the fraction of stellar mass lost to the
ISM at time $t$, by a stellar population aged $t'$, and $\psi(t')$
is the star formation rate at age $t'$. In practical terms we
calculate the recycled mass in each bin as
$m_\alpha = u_\alpha R_\alpha$ (19), and obtain the total stellar mass
formed (20) and the present-day stellar mass $M_{*,{\rm fiber}}$ (21)
within the fiber by summing $u_\alpha$ and $m_\alpha$, respectively,
over all bins,
where $R_\alpha$, known as the recycling fraction, is given by
the models, for each of the metallicities. $R_\alpha$ is typically
of the order of 0.5 for the older populations, whereas in
the younger bins the mass loss is much less significant
with $R_\alpha$ between 0.7 and 0.9, depending on the width
of the bin.

Finally, we correct for the fact that the fiber has an
aperture of 3 arcseconds, which means we do not typically
observe the entirety of the galaxy. We use the
observed fiber and petrosian magnitudes in the z-band
to scale up the stellar mass as

$$M_* = M_{*,{\rm fiber}} \times 10^{0.4 (z_p - fz_p)} \qquad (22)$$

where $z_p$ and $fz_p$ are the petrosian and fiber magnitudes
in the z-band respectively. This scaling assumes
that the parts of the galaxy which do not fall under the
fiber's aperture have an identical star formation history
to that observed. For an ensemble of galaxies fiber aperture
corrections are not important, in the sense that the
mean color from the fiber is the same as the mean color
from the photometry (Glazebrook et al. 2003). However,
one should keep in mind that they remain important for
individual galaxies, or at very low redshift.
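
The conversion from star formation fractions to masses described in this section can be sketched as below. The function and argument names are hypothetical; the luminosity distance is assumed to be expressed in whatever length unit the model normalisation requires, and `recycling` holds the per-bin fraction of formed mass that remains in stars.

```python
import numpy as np

def masses_from_fractions(x, recycling, d_l, z):
    """Convert star formation fractions into formed and remaining masses.

    x          star formation fraction per age bin
    recycling  fraction of the formed mass still in stars, per age bin
    d_l        luminosity distance, in the length unit assumed by the models
    z          redshift
    """
    scale = 4.0 * np.pi * d_l ** 2 * (1.0 + z)
    formed = np.asarray(x, dtype=float) * scale                # mass formed per bin
    remaining = formed * np.asarray(recycling, dtype=float)    # mass remaining per bin
    # Totals within the fiber: mass ever formed and mass remaining today
    return formed, remaining, formed.sum(), remaining.sum()
```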

4.2. Error estimates 

To estimate how much noise affects our recovered
solutions we take a rather empirical approach. For each
recovered solution we create $n_{\rm error}$ random noisy
realisations and we apply VESPA to each of these spectra.
In the current runs we have used $n_{\rm error} = 20$. We
re-bin each recovered solution in the parametrization of the
solution we want to analyse and estimate the covariance
matrices

$$C_{\alpha\beta}(x) = \langle (x_\alpha - \bar{x}_\alpha)(x_\beta - \bar{x}_\beta) \rangle \qquad (23)$$

$$C_{\alpha\beta}(Z) = \langle (Z_\alpha - \bar{Z}_\alpha)(Z_\beta - \bar{Z}_\beta) \rangle. \qquad (24)$$

For convenience, we also define a covariance matrix of
the unrecycled mass per bin as (using equation (17))

$$C_{\alpha\beta}(u) = C_{\alpha\beta}(x) \left[4\pi D_L^2 (1+z)\right]^2 \qquad (25)$$

from which we can estimate the error in the unrecycled
mass formed in bin $\alpha$ as $\sigma_u(\alpha) = \sqrt{C_{\alpha\alpha}(u)}$. This ignores
the uncertainty in the estimation of the redshift, which
is of little significance compared to the variance of $u_\alpha$
across realisations. From $\sigma_u(\alpha)$ we can calculate the
error $\sigma_m(\alpha)$ by multiplying by the corresponding recycling
fraction $R_\alpha$. The error on the metallicity is simply
$\sigma_Z(\alpha) = \sqrt{C_{\alpha\alpha}(Z)}$.

We use the full covariance matrix to estimate the
statistical error on the total stellar mass, $M_*$:

$$\sigma^2(M_*) = \sigma^2(M_{*,{\rm fiber}}) \, \gamma^2 + M_{*,{\rm fiber}}^2 \, \sigma^2(\gamma) \qquad (26)$$

where $\gamma$ is the conversion factor between fiber and
galaxy mass of equation (22), and $\sigma(\gamma)$ is the error
associated with this factor, calculated using the errors in
the petrosian and fiber z-band magnitudes. $\sigma^2(M_{*,{\rm fiber}})$
is estimated from the unrecycled mass covariance matrix
and the total recycling fraction of the galaxy, $R$:

$$\sigma^2(M_{*,{\rm fiber}}) = \sum_{\alpha\beta} C_{\alpha\beta}(u) \, R^2. \qquad (27)$$

This assumes that there is no error in the recycling
fractions, i.e., that we know the SFH exactly.
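
A sketch of how these error estimates might be computed from a set of noisy realisations is given below. The function and argument names are hypothetical, and the covariance is normalised by the number of realisations, which is an assumption (the text does not state the normalisation).

```python
import numpy as np

def mass_errors(x_realisations, recycling, total_recycling, d_l, z):
    """Estimate per-bin and total (fiber) mass errors from noisy realisations.

    x_realisations   recovered fractions, shape (n_error, n_bins)
    recycling        recycling fraction per bin
    total_recycling  recycling fraction of the galaxy as a whole
    """
    x = np.asarray(x_realisations, dtype=float)
    dx = x - x.mean(axis=0)
    cov_x = dx.T @ dx / x.shape[0]            # <(x_a - mean_a)(x_b - mean_b)>
    scale = 4.0 * np.pi * d_l ** 2 * (1.0 + z)
    cov_u = cov_x * scale ** 2                # covariance of the formed mass
    sigma_u = np.sqrt(np.diag(cov_u))         # error on formed mass per bin
    sigma_m = sigma_u * np.asarray(recycling, dtype=float)
    # Total error within the fiber uses every element of the covariance matrix
    sigma_fiber = np.sqrt(cov_u.sum()) * total_recycling
    return cov_u, sigma_u, sigma_m, sigma_fiber
```

Summing the full covariance matrix, rather than only its diagonal, accounts for the correlations between age bins when the per-bin masses are added together.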



This approach estimates errors due to photon noise 
only, but there is also a systematic error associated with 
the limitation of the models we use. The true effect of 
the models on the recovered solutions is impossible to 
calculate. By providing solutions using more than one 
set of models we give the user an opportunity to check 
how the answers of interest to them change by using 
different models. However, we caution against using 
this as a quantitative estimate of the systematic error 
associated with modeling, as models could be wrong or 
incomplete in the same way. 

As an illustration of the type of correlations one finds
between the different age bins, in Figure 9 we show the
correlation matrix for the galaxy in the left-hand panel
of Figure 2. The correlation function is defined as

$$r_{\alpha\beta} = \frac{C_{\alpha\beta}}{\sqrt{C_{\alpha\alpha} C_{\beta\beta}}}$$

and shows how independent two quantities are - in this
case $m_\alpha$ and $m_\beta$. By construction $r_{\alpha\beta} \in [-1, 1]$. One
can see that the highest cases of correlation happen in
adjacent bins, which is not surprising as the spectral
signature of two adjacent bins is more similar than that of
two bins further apart in lookback time. Non-adjacent
bins, however, are not completely uncorrelated, with the
highest absolute value of $r$ being around 0.6.
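
For reference, a correlation matrix of this kind can be obtained from a covariance matrix as in the short sketch below (the function name is illustrative only).

```python
import numpy as np

def correlation_matrix(cov):
    """Normalise a covariance matrix by its diagonal: r_ab = C_ab / sqrt(C_aa * C_bb)."""
    cov = np.asarray(cov, dtype=float)
    sigma = np.sqrt(np.diag(cov))
    return cov / np.outer(sigma, sigma)
```

Bins with zero variance across the realisations would give undefined entries and would need to be excluded beforehand.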




Fig. 9. - The correlation matrix of the recovered mass fractions
for the galaxy in the left-hand side panel of Figure 2.



4.3. Mass and metallicity per age bin 

The history of each individual galaxy is likely to
be parametrized by a combination of high- and low-resolution
age bins. One must be careful when interpreting the
masses and metallicities associated with low-resolution
bins. We re-iterate that in these cases the mass recovered
should be interpreted as the total mass recovered in the
bin, but we have little information about when, within the
bin, it formed. Similarly, we should interpret the metallicity
values recovered as a mass-weighted metallicity for
the whole duration of the bin. This information can be
accessed via the table BinProp, see Section 5 and Table 3.

In addition to the masses and metallicities obtained
in their original resolution, we choose to also provide
a fully-resolved star formation and metallicity history
for each galaxy. In this case, the solutions are post-processed
to be presented in 16 bins. We do this by
using the weights of equation (7) to split the mass of
a low-resolution bin across the relevant high-resolution
bins. We use the same weights for the error in the mass.
The metallicity of the new high-resolution bins is
set to be the same as that of the old low-resolution bin, and
the error remains the same. This information can be
accessed via the table HRBinProp, see Section 5 and
Table 4.

By construction, the solutions in HRBinProp are
always consistent with the ones in BinProp, but they
should be used with caution. HRBinProp is a very
useful table when one needs to have a uniform set of
bins in order to study, for example, average rest-frame
quantities as a function of time. Using this table for
an ensemble of galaxies is appropriate, but not on an
individual galaxy basis. Care must also be taken with
the treatment of mass errors across the high-resolution
bins, as they will of course be highly correlated with
those bins which formed the same low-resolution bin.
When taking averages over large ensembles, an error
on the mean of each individual bin might be more
appropriate.
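
The post-processing of a single low-resolution bin can be sketched as follows. The weights of equation (7) are not reproduced in this section, so they are passed in as a hypothetical input, and normalising them to sum to one is an assumption made so that the total mass of the bin is conserved.

```python
import numpy as np

def split_low_resolution_bin(mass, mass_error, metallicity, metallicity_error, weights):
    """Distribute one low-resolution bin over its high-resolution bins."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                              # make the weights sum to one
    n = w.size
    return (mass * w,                            # mass is shared according to the weights
            mass_error * w,                      # the same weights apply to the mass error
            np.full(n, metallicity),             # metallicity is copied unchanged
            np.full(n, metallicity_error))       # and so is its error
```

Because every high-resolution bin produced this way inherits its values from the same parent bin, the resulting errors are fully correlated, which is why the text cautions against using them on an individual-galaxy basis.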

4.4. Dust

Depending on the dust model used we recover either
one or two values of dust extinction, one associated
with young stars ($\tau_V^{\rm BC}$, applied to stellar populations
younger than 0.03 Gyr) and one associated with the
whole galaxy ($\tau_V^{\rm ISM}$, applied to all stellar populations),
following the mixed slab model of Charlot and Fall
(2000).

The two-parameter dust model gives a more realistic
description of the data, but it is not clear whether
the data justifies the extra degree of freedom. In
Tojeiro et al. (2007) we saw that even if $\tau_V^{\rm BC}$ is slightly
degenerate with the amount of stellar mass created in
populations younger than $t_{\rm BC}$, it helps to recover a
more accurate value of $\tau_V^{\rm ISM}$. The solution with a
one-parameter model is a necessary step to a solution with
a two-parameter model (see Tojeiro et al. 2007), and
we see no reason not to make the two sets of solutions
public, especially if the user is interested in populations
younger than $t_{\rm BC}$. However, we do not run a full
error analysis on the one-parameter solution, and all the error
columns are set to zero.

5. THE DATABASE

The catalogue is being published as a relational
T-SQL database, which provides maximum flexibility
and interoperability with other databases. It can be
accessed through the WFCAM Science Archive (WSA)
via http://www-wfau.roe.ac.uk/vespa/ and will be
made available on Astrogrid in due course. Information
about subsequent data releases using new models, code
versions or data-sets will be given on this website, and
any information online overrides information in this
paper.

The database is split into a number of tables, each
with a number of fields (columns), which can be accessed
and queried via SQL.

We have split the database into seven tables:

• GalProp includes results relative to the galaxy as
a whole.

• BinProp has results which refer to each bin individually,
in the original resolution as recovered by
VESPA.

• HRBinProp splits the recovered solutions into the
highest possible resolution (16 bins), using the appropriate
weights which are given by equation (7).

• DustProp holds dust information.

• BinID identifies each VESPA low- or high-resolution
bin.

• RunProp holds run-specific details, such as SSP
and dust models used.

• LookupTable links VESPA's unique identifier with
SDSS's own identifiers, such as specObjID, plate,
MJD and fiber ID.

Tables 2 to 7 detail each of the fields included
in the tables mentioned above. Galaxies are identified
by a unique index, which can be associated with SDSS's
specObjID via the table LookupTable. This means
that object properties which are already included in
the SDSS need not be included in this database, as
cross-matching can be done in a straightforward way
using specObjID.



Field              Units        Description
indexP             -            Unique identifier, constructed from SDSS's plate, MJD and fiberID info.
runID              -            Gives detail of the run, in RunProp.
m_stellar          $M_\odot$    $M_*$ - equation (22).
m_stellar_error    $M_\odot$    $\sigma(M_*)$ - equation (26).
t_lb               Gyr          Lookback time of galaxy, assuming a WMAP5 cosmology.
chi2               -            $\chi^2$ of the unmasked regions used for spectral fit.
SNR                -            Signal to noise ratio of the used (unmasked) spectrum, at the models' resolution.
nbins              -            Number of recovered bins in a galaxy.
npops              -            Number of recovered bins with non-zero mass.

TABLE 2
GalProp



Field         Units        Description
indexP        -            Unique identifier, constructed from SDSS's plate, MJD and fiberID info.
runID         -            Gives detail of the run, in RunProp.
binID         -            Bin identifier as given by Figure 1.
mass          $M_\odot$    Mass formed in bin - equation (17) - corrected for fiber aperture.
mass_error    $M_\odot$    $\sigma_u(\alpha)$ - derived from equation (25) - corrected for fiber aperture.
SFR           -            Star formation rate in bin.
Z             -            Metallicity in bin.
Z_error       -            $\sigma_Z(\alpha)$ derived from equation (24).

TABLE 3
BinProp

Field         Units        Description
indexP        -            Unique identifier, constructed from SDSS's plate, MJD and fiberID info.
runID         -            Gives detail of the run, in RunProp.
binID         -            Bin identifier as given by Figure 1.
mass          $M_\odot$    Mass formed in bin - from equation (17) with weights from equation (7) and corrected for fiber aperture.
mass_error    $M_\odot$    $\sigma_u(\alpha)$ - from equation (25) with weights from equation (7) and corrected for fiber aperture.
Z             -            Metallicity in bin.
Z_error       -            $\sigma_Z(\alpha)$ derived from equation (24).

TABLE 4
HRBinProp

Field      Units    Description
indexP     -        Unique identifier, constructed from SDSS's plate, MJD and fiberID info.
runID      -        Gives detail of the run, in RunProp.
dustID     -        Dust identifier: 1 for $\tau_V^{\rm BC}$ and 2 for $\tau_V^{\rm ISM}$.
dustVal    -        Either $\tau_V^{\rm BC}$ or $\tau_V^{\rm ISM}$, according to dustID.

TABLE 5
DustProp

Field        Units    Description
binID        -        Bin identifier, as given by Figure 1.
ageStart     Gyr      Age of the young boundary of the bin.
ageEnd       Gyr      Age of the old boundary of the bin.
width        Gyr      Width of the bin in Gyr.
width_bin    -        Width of the bin in units of high-resolution bins.

TABLE 6
BinID

Field          Units    Description
runPropID      -        Unique run ID.
SSP            -        SSP models used.
codeVersion    -        VESPA code version.
dustModel      -        Dust model used.
dataRelease    -        SDSS's data release.
sample         -        MGS or LRG.

TABLE 7
RunProp

Table 8 lists the details of the VESPA runs which
are currently complete, and are either on the database
at the time of publication or will be shortly.
Future extensions of the catalogue or code will
extend this table accordingly.

runID    SSP     codeVersion    dustModel    dataRelease    sample
1        BC03    1.0            1            DR7            MGS
2        BC03    1.0            2            DR7            MGS
3        M05     1.0            1            DR7            MGS
4        M05     1.0            2            DR7            MGS
5        BC03    1.0            1            DR7            LRG
6        BC03    1.0            2            DR7            LRG
7        M05     1.0            1            DR7            LRG
8        M05     1.0            2            DR7            LRG

TABLE 8
RunProp in full

5.1. Example queries

In this section we give some specific examples of how
to explore the catalogue. This list is simply intended to
demonstrate some of the potential of the catalogue, and
give the user a starting point.

A simple query to return the present-day stellar mass
of all galaxies in the LRG sample, analysed with BC03
and a 2-parameter dust model could look like

    SELECT specObjID, m_stellar, m_stellar_error
    FROM lookUpTable as l, galProp as g
    WHERE
        g.indexP = l.indexP
        AND g.runID = 6

The rest-frame averaged star formation fractions as a
function of lookback time, for the MGS galaxies, analysed
with M05 and a 2-parameter dust model are accessible
via the query

    SELECT hr.binID, AVG(hr.mass/g.m_stellar)
    FROM hrBinProp as hr, galProp as g
    WHERE
        g.indexP = hr.indexP
        AND g.runID = 4
        AND hr.runID = 4
    GROUP BY hr.binID
    ORDER BY hr.binID

It is also easy to select galaxies according to their star
formation histories. For example, the following query
returns the stellar mass distribution of galaxies which
form more than 50% of their present-day stellar mass in
the last age bin:

    SELECT
        .5+FLOOR(10.*LOG10(g.m_stellar))/10. as lgm,
        COUNT(*) as freq
    FROM GalProp as g, BinProp as b,
        binID as bID
    WHERE
        g.indexP = b.indexP
        AND bID.binID = b.binID
        AND bID.ageEnd = 14
        AND b.mass > (0.5 * g.m_stellar)
        AND g.m_stellar < 1e14
        AND g.m_stellar > 1e7
        AND g.runID = 2
        AND b.runID = 2
    GROUP BY .5+FLOOR(10.*LOG10(g.m_stellar))/10.
    ORDER BY lgm

We can also access average star formation rates over
any period of time, by targeting the correct bins. To
return the average star formation rate over the last 115
Myr of the lifetime of the galaxy (corresponding to the
first five high-resolution bins), one can do

    SELECT specObjID, SUM(hr.mass)/115. as SFR
    FROM
        lookUpTable as l, hrBinProp as hr,
        binID as bID
    WHERE
        l.indexP = hr.indexP
        AND hr.binID = bID.binID
        AND hr.binID < 5
        AND hr.runID = 4
    GROUP BY specObjID

To access the star formation history of any given galaxy
the user can select, for example, on plate, fiberID and
MJD as:

    SELECT specObjID, bID.ageStart, bID.ageEnd,
        b.mass, b.mass_error, b.Z, b.Z_error
    FROM
        binID as bID, binProp as b,
        lookUpTable as l
    WHERE
        l.indexP = b.indexP
        AND b.binID = bID.binID
        AND l.plate = 0353
        AND l.mjd = 51703
        AND l.fiberID = 077
        AND b.runID = 2

We re-iterate that when selecting from BinProp the
number of rows returned will vary from one galaxy to the
next. If a uniform number of bins is required, then use
HRBinProp, but this should be done with caution (see
Section 4.3). The user should also be careful to always
make a selection on runID on all relevant tables to avoid
duplicate results - each galaxy might be represented
in the catalogue a number of times, analysed with a
different combination of models. To run the same query
with any different set of models, the user needs only to
change runID.

Whereas these queries are focused on rest-frame
quantities, we also provide the redshift and look-back
time of each galaxy in the table GalProp, for easy access
to Earth-frame quantities.

It is also possible to directly tap into the SDSS DR7
databases that are held at the WSA. For example, to
return the fiber-corrected stellar masses and the fiber corrections
which were applied to those masses one can do:

    SELECT l.specObjID, g.m_stellar,
        sdss.petromag_z - sdss.fibermag_z AS fiber_diff,
        power(10.0000, .4*(sdss.petromag_z -
            sdss.fibermag_z))
        AS fiber_corr
    FROM bestDR7..specPhotoAll as sdss,
        galProp as g, lookUpTable as l
    WHERE
        sdss.specObjID = l.specObjID
        AND l.indexP = g.indexP
        AND g.runID = 2

Details on the other databases at the WSA can be
found at http://surveys.roe.ac.uk/wsa/.

Fig. 10. - Number of populated mass bins for the two samples
of galaxies currently in the catalogue: the MGS (in the solid line)
and the LRG sample (in the dashed line).

6. RESULTS 

The VESPA catalogue can be exploited in many and
varied ways. Here, we take the opportunity to show
some basic characteristics of the data. We explore average
rest-frame star-formation histories, stellar masses,
dust content and modeling, star formation rates and
show some comparisons with GALEX magnitudes.

We will mention red and blue galaxies throughout,
which refer to a simple cut in $u - r$ color. We label
galaxies with $u - r < 1.5$ as blue and
those with $u - r > 2.8$ as red.
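
The color cut can be written as a small helper, sketched below. The function name is illustrative; galaxies with intermediate colors are left unlabelled, following the text.

```python
def color_class(u, r, blue_cut=1.5, red_cut=2.8):
    """Label a galaxy by its u - r color; galaxies between the two cuts get None."""
    color = u - r
    if color < blue_cut:
        return "blue"
    if color > red_cut:
        return "red"
    return None
```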

6.1. Number of parameters 

First let us look at the typical number of non-zero mass
bins recovered for each galaxy, which is shown in Figure 10.
As first shown in Tojeiro et al. (2007), VESPA
recovers typically between two and five populations from
each galaxy in the MGS. This histogram is dramatically
different for the LRG sample, with the majority of the
galaxies being parametrized with three or fewer populations.
This is likely to be a combination of two factors:
LRGs have simple star formation histories, dominated
by old stars; and the spectral data for LRGs is of poorer
quality, given the larger redshift range of this sample.
This suggests that parametrizing the LRGs with a single
burst at high redshift may be an acceptable solution,
especially given the typical quality of the data. However,
such a simplification is not justifiable for an MGS galaxy,
and the SDSS data certainly allows for a much better
description.

6.2. Stellar masses in DR7 

SDSS's DR6 introduced a change in the spectroscopic
calibration scale of approximately 0.35 magnitudes
- spectra from DR6 and onwards are brighter
(Adelman-McCarthy et al. 2008). The result is a nearly
constant offset in the estimated masses between data
releases before and after DR6, as can be seen in Figure 11.
Assuming a slope of unity, we find that the offset which
brings the two estimates into agreement is 0.26 dex. We
find that the difference in other derived quantities, such
as star formation histories and dust extinction, is negligible.

Fig. 11. - Stellar masses from a random sub-sample of galaxies
analysed using their DR5 and DR7 spectra. The difference is due
to a new flux scale introduced in DR6 (Adelman-McCarthy et al.
2008). The constant offset which best fits the data is 0.26 dex, and
is represented by the red line.

Fig. 12. - $\chi^2$ of a random sub-sample of galaxies analysed using
their DR5 and DR7 spectra. There is a visible improvement in
DR7, in the majority of the cases.

Fig. 13. - Present-day stellar mass estimates, recovered using
the BC03 and the M05 models, and a two-parameter dust model.

Fig. 14. - Mass-averaged metallicity for a random selection of
20,000 galaxies analysed with the BC03 and M05 models. The
solid red line shows the mean, and the dashed red line is the
one-to-one line. For reference, the vertical black dashed line is at solar
metallicity.
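
With the slope fixed at unity, the best-fitting constant offset between two sets of log-mass estimates reduces to the mean of their difference. The sketch below illustrates this for an unweighted least-squares fit; the function name is hypothetical and the absence of error weighting is an assumption.

```python
import numpy as np

def unit_slope_offset(log_mass_old, log_mass_new):
    """Least-squares offset b in log_mass_new = log_mass_old + b (slope fixed at 1)."""
    diff = np.asarray(log_mass_new, dtype=float) - np.asarray(log_mass_old, dtype=float)
    # Minimising sum((diff - b)**2) over b gives the mean of the differences
    return diff.mean()
```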

Figure 12 shows the difference in the $\chi^2$ values of
the fits. We see that in the majority of cases the new
spectro-photometric calibrations introduced with DR6
have somewhat improved the goodness-of-fit of the
VESPA solutions.

The effect of the choice of SSP models on the estimation
of the stellar mass of high-redshift galaxies is known to
be significant (Maraston et al. 2006). However, we find
that under the SDSS's redshift and rest-frame wavelength
range, estimates of stellar mass from full-spectral
fitting are robust against the choice of SSP modeling,
as can be seen in Figure 13. BC03 masses are only
very slightly systematically higher but within the $1\sigma$
errors. The reduced $\chi^2$ of the one-to-one line, taking
into account the errors on both measurements, is 1.3.
This indicates that there is a source of error associated
with the choice of SSP which is not accounted for in
the formal errors returned by VESPA. This is expected,
and we drew attention to this matter in Section
4.2. However, there is no statistically significant offset.
M05 models generally bring masses down due to the
inclusion of the thermally pulsating asymptotic giant
branch stars, which are very bright and generally require
less stellar mass to explain a given luminosity (see also
Section 6.4), but these stars have the most effect at
wavelengths redwards of the SDSS spectral range, as
seen in Maraston et al. (2006).
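
A reduced $\chi^2$ of the one-to-one line with errors on both measurements can be computed as in the sketch below. It assumes the two sets of errors are independent, so that their variances add, and that the number of degrees of freedom equals the number of galaxies since the one-to-one line has no free parameters; the function name is illustrative.

```python
import numpy as np

def reduced_chi2_one_to_one(a, sigma_a, b, sigma_b):
    """Reduced chi-squared of the line a = b with errors on both quantities."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    # Independent errors: the variance of (a - b) is the sum of the two variances
    variance = np.asarray(sigma_a, dtype=float) ** 2 + np.asarray(sigma_b, dtype=float) ** 2
    chi2 = np.sum((a - b) ** 2 / variance)
    return chi2 / a.size
```

A value noticeably above unity, as found here, suggests that the formal errors underestimate the true scatter between the two sets of estimates.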

6.3. Metallicity estimates 

For each galaxy, we can also compute the mass-averaged
metallicity using the recovered star-formation
history. In contrast with estimates for the total stellar
mass, we find in Figure 14 that the estimated metallicity
of each individual galaxy depends heavily on the SSP
modelling used.
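
A mass-averaged metallicity of this kind is a weighted mean over the age bins, as sketched below. The function name is illustrative, and whether the weights should be the formed or the remaining mass per bin is not specified here, so the choice of input is left to the user.

```python
import numpy as np

def mass_averaged_metallicity(mass, metallicity):
    """Mass-weighted mean metallicity over the populated age bins."""
    mass = np.asarray(mass, dtype=float)
    return np.sum(mass * np.asarray(metallicity, dtype=float)) / mass.sum()
```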

Some of this scatter is due to poorly constrained populations
which may have some weight in terms of mass,
but not in terms of light contribution (in, for example,
galaxies with very young stellar populations). However,
the overall trend is very clearly something which is followed
by the vast majority of galaxies. We cannot say at
this stage which model is more correct, and we are left
simply to emphasise that the mass-averaged metallicity
of galaxies has to be interpreted within the limitations
of each set of models.

Fig. 15. - The rest-frame averaged star formation history of a
sub-sample of galaxies analysed using the BC03 models (solid line)
and the M05 models (dashed line). The error bars are errors on
the mean and we used a one-parameter dust model.

6.4. Average star-formation histories 

Figure 15 shows the averaged rest-frame star formation
history of a sample of 20,000 galaxies in the MGS,
obtained using the BC03 and the M05 models. We do
not intend to assess which sets of models are better, but
rather identify where different models give very different
results. In this case, for example, the inclusion of the
thermally pulsating asymptotic giant branch stars in
the M05 models means that populations of around 1 Gyr
are intrinsically brighter. This decreases the amount of
mass needed in populations of this age, and results in a
more smoothly decaying star formation history.

Figure 16 shows the same for a subsample of blue
and red galaxies. Whilst what we call red galaxies form
most of their stars over 9 Gyr ago, blue galaxies have
a significantly flatter average star formation history and
much more recent star formation, as expected.



6.5. Star formation rates 

We can obtain an estimate of the instantaneous star
formation rate by averaging the recent mass formed in
a galaxy. We compared the SFR averaged over the last
115 Myr - corresponding to the first 5 bins - to those of
Brinchmann et al. (2004), which relate to DR4. Prior
to this comparison we corrected our results for the mass
offset of 0.26 dex, as discussed in the previous section.
The results for both fiber and total quantities can be
seen in Figure 17.
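
The averaging described above amounts to summing the mass formed in the youngest high-resolution bins and dividing by the time they span. A minimal sketch follows, with hypothetical function and argument names and with the bins assumed to be ordered from youngest to oldest.

```python
import numpy as np

def recent_sfr(hr_mass, n_bins=5, duration_yr=115e6):
    """Average star formation rate (solar masses per year) over the youngest bins.

    hr_mass      mass formed in each high-resolution bin, youngest first
    n_bins       number of young bins to include
    duration_yr  total time span covered by those bins, in years
    """
    recent_mass = np.sum(np.asarray(hr_mass, dtype=float)[:n_bins])
    return float(recent_mass) / duration_yr
```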



6.6. Dust 

VESPA provides an estimation of the dust content of 
a galaxy. Figure [18] shows the recovered values of T^f^^ 
for a random subsample of approximately 10,000 galaxies 
analysed using the BC03 models, and a two-parameter 
model. We see a very bimodal distribution, which for red 
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Fig. 16. — The rest-frame averaged star formation history of 
a sub-sample of galaxies analysed using the BC03 models (solid 
lines) and the M05 models (dashed lines) for red and blue galaxies 
in red (lower) and blue (upper) respectively. The error bars are 
errors on the mean and we used a one-parameter dust model. 
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Fig. 17. — The star formation rate estimated by averaging the 
mass formed over the latest 115 Myrs of each galaxy, using a two- 
parameter dust model and BC03 models. W e compare these values 
to the SRF catalogue of iBrinchmann et al.l l[2j3 04) , and first correct 
our recovered masses by 0.26 dex (see Section l6.2l l. The top panel 
shows the results within the 3-arcsecond fiber aperture, and the 
bottom panel the results extrapolated to the whole of the galaxy 
using equation II22I I. 



galaxies can be decoupled almost cleanly with our chosen 
color cut. This is very much what one would expect to 
see, with the red galaxies being more likely to populate 
the area of the histogram corresponding to a low dust 
attenuation. 

The choice between a 1- or a 2-parameter dust model 
does not impact significantly on the recovered values of 
Ty^^ , nor on the recovered star formation histories for 
the old populations. This happens because the young 
populations, which are the only ones affected by Ty'^ 
are only significant, in terms of light, in the UV region 
of the spectrum. We exemplify this in Figure [THl which 
shows the recovered spectrum of nine random galaxies 
extended to the UV. 
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Fig. 18. — Distribution of Ty for a sample of galaxies, analysed 
with BC03 and a two-parameter dust model (black line). The red 
(dashed) and blue (dot-dashed) lines show the same but for sub- 
sample of red and blue galaxies, respectively. 




Fig. 20. — The rest-frame averaged star formation history of 
a sample of galaxies analysed with a one-parameter dust model 
(dashed line) and a two-parameter dust model (solid line). The 
error bars are errors on the mean. 

Fig. 19. — The recovered spectrum of nine randomly selected 
galaxies, extrapolated into the UV. The black line is the recovered 
spectrum obtained with a two-parameter dust model, and the 
overplotted green line is the recovered spectrum obtained with a 
one-parameter dust model. The blue points are GALEX measured 
fluxes. For galaxies with very little or no recent star formation 
the two dust models are in agreement, but differences are evident 
when young stars are present in the galaxy. 

What is visibly affected by the choice of dust modelling, 
however, is the amount of mass recovered in the first 
two bins, as expected. Figure 20 shows the average 
rest-frame star-formation history of a sub-sample of 
MGS galaxies analysed with both a 1-parameter and a 
2-parameter dust model, and the BC03 models. 

In Section 6.7 we compare VESPA UV predictions with 
GALEX data to address the question of the need for the 
two-parameter dust model. 

6.6.1. Balmer decrements 

A direct comparison can be made with Balmer 
decrements between the H$\alpha$ and H$\beta$ lines. Given that dust 
extinction is wavelength dependent, one can estimate 
the overall amount of extinction by studying the ratio 
of pairs of Balmer lines. If we know what the 
unextinguished Balmer decrement should be for a given pair, 
the expected Balmer decrement as a function of optical 
depth can be easily computed for the one-parameter 
dust model. We assumed H$\alpha$/H$\beta$ = 2.87 in the absence 
of dust. 



Using the DR7 release of the MPA added value 
catalogue (Kauffmann et al. 2003; Brinchmann et al. 
2004), we measured the Balmer decrement of a sample 
of galaxies, and compared it to the expected Balmer 
decrement given the recovered value of $\tau_V^{\rm ISM}$. The 
result can be seen in Figure 21. We recover the general 
behaviour expected from the theory, but we measure 
values of the Balmer decrement which are systematically 
higher than expected, although within the 1-sigma interval, 
given the dust content we recover with VESPA. We find the 
agreement encouraging. A slight adjustment of the 
expected value of the decrement in the absence of dust, 
combined with a steeper dust law (see also Section 6.7), 
would bring the two measurements closer. The galaxies 
for which a reliable Balmer decrement can be measured 
are typically star forming, and it is therefore not entirely 
surprising that $\tau_V^{\rm ISM}$ is not a complete representation 
of the dust within the galaxy. Although Figure 21 
suggests that the actual values of optical depth 
obtained using the two methods are offset, the 
good qualitative agreement allows the user to split any 
galaxy sample according to dust content with confidence. 
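
The expected decrement curve discussed above can be sketched numerically. The snippet below is only an illustration under stated assumptions, not the code used to build the catalogue: it assumes a power-law optical depth $\tau_\lambda = \tau_V (\lambda / 5500\,{\rm \AA})^{-0.7}$, a mixed-slab attenuation of the Charlot & Fall form for $\tau_V \le 1$ and a uniform screen above that. The function names, the 5500 Angstrom pivot and the direction of the switch at $\tau_V = 1$ are assumptions made for the example.

```python
import math

H_ALPHA = 6563.0        # rest wavelength in Angstrom
H_BETA = 4861.0         # rest wavelength in Angstrom
INTRINSIC_RATIO = 2.87  # dust-free H-alpha/H-beta assumed in the text


def tau_at(tau_v, wavelength, slope=-0.7, pivot=5500.0):
    """Optical depth at `wavelength` for a power-law dust curve (pivot is assumed)."""
    return tau_v * (wavelength / pivot) ** slope


def exp1(x, terms=60):
    """Exponential integral E1(x) from its power series (adequate for x up to a few)."""
    gamma = 0.5772156649015329  # Euler-Mascheroni constant
    total, term = 0.0, 1.0
    for k in range(1, terms + 1):
        term *= -x / k          # term is now (-x)^k / k!
        total += term / k
    return -gamma - math.log(x) - total


def transmission(tau, mixed):
    """Fraction of light transmitted: mixed slab (assumed form) or uniform screen."""
    if tau == 0.0:
        return 1.0
    if mixed:
        return (1.0 + (tau - 1.0) * math.exp(-tau) - tau ** 2 * exp1(tau)) / (2.0 * tau)
    return math.exp(-tau)


def expected_balmer_decrement(tau_v):
    """Expected H-alpha/H-beta for a given tau_V; the geometry switches at tau_V = 1."""
    mixed = tau_v <= 1.0
    t_alpha = transmission(tau_at(tau_v, H_ALPHA), mixed)
    t_beta = transmission(tau_at(tau_v, H_BETA), mixed)
    return INTRINSIC_RATIO * t_alpha / t_beta


if __name__ == "__main__":
    for tau_v in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(tau_v, round(expected_balmer_decrement(tau_v), 3))
```

Because the two geometries redden the line ratio by different amounts at the same optical depth, the curve changes character where the model switches, which is the feature noted in the caption of Figure 21.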



6.7. Comparisons with GALEX 

There are over 300,000 objects in the MGS which have 
GALEX magnitudes (Martin et al. 2005). The SDSS 
optical range stops redwards of the two GALEX filters, 
and it is interesting to see how the predicted GALEX 
magnitudes from the recovered spectral fits compare to 
the observed ones. 

Fig. 21. — The value of $\tau_V^{\rm ISM}$ recovered by VESPA plotted 
against the measured Balmer decrement (H$\alpha$/H$\beta$) for a sample 
of galaxies analysed with a one-parameter dust model. The red 
stars are the mean value in each bin of $\tau_V^{\rm ISM}$, and the error bars 
are errors on that mean. The blue line is the expected Balmer 
decrement for a one-parameter dust model as a function of $\tau_V^{\rm ISM}$. 
The kink in the line at $\tau_V^{\rm ISM} = 1$ results from the change 
from a mixed-slab to a uniform-slab model (see Section 2.2.2). 

As remarked earlier, Figure 19 shows how the recovered 
UV spectrum compares to the GALEX measured 
fluxes for nine random galaxies. Although not 
representative, it serves as a good visual indication that the UV 
flux might be a way to distinguish between the one- and 
the two-parameter dust models. For a more quantitative 
analysis, we computed GALEX magnitudes from our 
recovered spectra and the filter transmission curves and 
compared them to the published GALEX magnitudes. 
Figure 22 shows the histogram of the difference of the 
two, for both GALEX filters and in the case of the 
two dust models. We note three things about these 
histograms: there is only a very small difference between 
the one- and two-parameter dust models; there is an 
offset of the peak of the two distributions with respect 
to zero; and the scatter in the far-UV filter is larger 
than in the near-UV filter. Let us now comment on each 
of these points in turn. 
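
As an illustration of how a broadband magnitude can be computed from a spectrum and a filter transmission curve, the sketch below evaluates a photon-weighted AB magnitude with a simple trapezoidal integral. It is not the catalogue code: the function names are hypothetical, the top-hat "filter" in the usage example is a toy stand-in for a real GALEX response curve, and the spectrum is assumed to be in erg/s/cm^2/Angstrom on a wavelength grid in Angstrom.

```python
import math

C_ANGSTROM_PER_S = 2.99792458e18  # speed of light in Angstrom per second


def trapz(y, x):
    """Trapezoidal integral of samples y on the grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def ab_magnitude(wavelength, f_lambda, transmission):
    """AB magnitude of a spectrum through a filter (photon-counting convention).

    Mean f_nu = int(f_lambda * T * lambda dlambda) / (c * int(T / lambda dlambda)).
    """
    numerator = trapz(
        [f * t * w for f, t, w in zip(f_lambda, transmission, wavelength)], wavelength
    )
    denominator = C_ANGSTROM_PER_S * trapz(
        [t / w for t, w in zip(transmission, wavelength)], wavelength
    )
    return -2.5 * math.log10(numerator / denominator) - 48.60


# Usage with a toy top-hat filter (hypothetical, 1800-2800 Angstrom) and a
# source of constant f_nu = 3631 Jy, which defines AB magnitude zero.
wl = [1700.0 + 10.0 * i for i in range(121)]
toy_filter = [1.0 if 1800.0 <= w <= 2800.0 else 0.0 for w in wl]
f_nu = 3.631e-20  # erg/s/cm^2/Hz
flat_source = [f_nu * C_ANGSTROM_PER_S / w ** 2 for w in wl]
toy_mag = ab_magnitude(wl, flat_source, toy_filter)
```

The difference between such a predicted magnitude and the published one is the quantity that is histogrammed in the comparison.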

The similarity of the two panels in Figure 22 suggests 
that perhaps on average both models do just as well. 
We take this one step further by repeating the analysis 
on red and blue galaxies separately, for the near-UV 
filter - the resulting histograms are shown in Figure 23. 
We find once again that the distributions from the two 
dust models are very similar, but there are differences 
associated with the color of the galaxy. Red galaxies 
have a larger scatter, but are not offset with respect 
to zero. We can also distinguish a slight peak in the 
histogram of red galaxies, where the VESPA magnitudes 
are fainter than the measured GALEX magnitudes by 
around one magnitude. This corresponds to cases where 
VESPA predicts no recent star formation, but GALEX 
tells us it is there. Visually, it corresponds to the case in 
the central panel of Figure 19. This happens in a small 
number of galaxies, but tells us that there is valuable 
information to be added to the optical SDSS spectrum 
of a galaxy. The offset to brighter VESPA magnitudes 
we saw before comes mostly from blue galaxies, which 
leads us into the next point. 

Our dust curve goes as $\lambda^{-0.7}$, which is a relatively flat 
extinction curve (see e.g. Figure 3). A steeper curve 
would mean that VESPA magnitudes would be fainter 
(larger), and shift the histograms in Figure 22 closer to 
zero. To investigate how a steeper dust curve in the UV 
might affect these histograms, we corrected the VESPA 
magnitudes to a steeper power-law dust curve, assuming 
the same amount of dust, i.e., keeping $\tau_V^{\rm ISM}$ and $\tau_V^{\rm BC}$ 
fixed. The resulting histogram, for the near-UV filter, 
can be seen in Figure 24. The change comes almost 
exclusively from the blue galaxies, as expected. This sug- 
gests that our choice of dust law might be too flat in this 
wavelength range. This is relevant if we want to include 
GALEX data in the fits, as we intend to do in the future. 
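
To give a feel for the size of such a correction, the sketch below computes the change in magnitude when the slope of a power-law dust curve is steepened at fixed $\tau_V$. It is a simplified illustration only: it assumes a uniform dust screen applied to the whole spectrum and a 5500 Angstrom pivot, and the steeper slope of -1.0 and the 2300 Angstrom wavelength in the usage lines are placeholders, not the values adopted in the paper.

```python
import math


def optical_depth(tau_v, wavelength, slope, pivot=5500.0):
    """Power-law optical depth at `wavelength` (Angstrom); the pivot is assumed."""
    return tau_v * (wavelength / pivot) ** slope


def screen_magnitude_shift(tau_v, wavelength, old_slope=-0.7, new_slope=-1.0):
    """Magnitude change from steepening the dust law at fixed tau_V.

    Assumes a uniform screen, for which extinction in magnitudes is
    2.5 * log10(e) * tau. Positive values mean fainter magnitudes.
    """
    delta_tau = optical_depth(tau_v, wavelength, new_slope) - optical_depth(
        tau_v, wavelength, old_slope
    )
    return 2.5 * math.log10(math.e) * delta_tau


# Usage: a hypothetical galaxy with tau_V = 1 evaluated near the near-UV band.
nuv_shift = screen_magnitude_shift(1.0, 2300.0)
pivot_shift = screen_magnitude_shift(1.0, 5500.0)  # no change at the pivot
```

In this toy picture the shift scales linearly with $\tau_V$ and vanishes at the pivot wavelength, so dusty galaxies move furthest in the UV while the optical fit is largely unchanged.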

Finally, we investigate whether the large scatter in the 
magnitude differences can be explained by the recovered 
errors in the mass fractions. This is particularly 
important when we are talking about the UV and young stars, 
as a small change in mass translates into a large 
difference in magnitudes. The error on each recovered flux 
point is 

$$\sigma^2_{F_j} = \sum_{\alpha\beta} S_{j\alpha}\, C_{\alpha\beta}(m)\, S_{j\beta} \qquad (29)$$

where $C(m)$ is the covariance matrix for the 
estimated masses, corrected for recycling fractions and 
fiber aperture. From this the error on the estimated 
GALEX magnitude, $\sigma_m({\rm VESPA})$, can be estimated. 
The distribution of these errors, for the near-UV, is 
shown in Figure 25. 
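
A minimal numerical sketch of this propagation is given below. It assumes that the flux at each point is a linear combination of the recovered masses, so that the variance is the quadratic form of the model spectra with the mass covariance matrix, and it converts a flux error into a magnitude error to first order. The function names and the toy numbers are illustrative only.

```python
import math


def flux_variance(s_row, cov):
    """Variance of one flux point: sum over a, b of S[a] * C[a][b] * S[b]."""
    n = len(s_row)
    return sum(s_row[a] * cov[a][b] * s_row[b] for a in range(n) for b in range(n))


def magnitude_error(flux, flux_var):
    """First-order magnitude error, (2.5 / ln 10) * sigma_F / F."""
    return 2.5 / math.log(10.0) * math.sqrt(flux_var) / flux


# Usage: two populations with correlated mass errors (toy values).
s_row = [1.0, 2.0]                      # model flux per unit mass in each bin
cov = [[0.04, 0.01], [0.01, 0.09]]      # covariance of the recovered masses
toy_variance = flux_variance(s_row, cov)
```

The off-diagonal terms matter here: correlated errors between adjacent age bins can either inflate or reduce the flux error relative to a diagonal approximation.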

The histogram is similar for the far-UV filter, but the 
errors are generally larger. This makes sense, as 
we expect that region to be more sensitive to young 
stars, and it is a possible explanation for the slightly larger 
scatter for this filter in Figure 22. However, it is obvious 
that $\sigma_m({\rm VESPA})$ alone cannot explain the scatter 
in Figures [H] and 22. In practice this is not entirely 
surprising, given that the dust law alone will affect the 
estimated GALEX magnitudes from VESPA and this 
is a source of error not accounted for at the moment. 
Another important source of scatter is undoubtedly the 
fiber correction, which we calculate using the z-band. 
We are effectively extrapolating to the total flux from 
young stars using a correction based on old populations, 
and we expect some scatter between the two. 

We conclude that on average, the star formation his- 
tories recovered by VESPA are accurate enough, even at 
young ages, to predict UV magnitudes that are far be- 
yond the fitted spectral regime. However, the spectral 
range offered by the SDSS is simply not enough to break 
some degeneracies, especially concerning young star for- 
mation and dust. From a technical point of view, the 
extra information in GALEX photometry can be easily 
incorporated in the VESPA analysis, as can any other 
photometry at the other side of the spectral range. 

7. CONCLUSIONS 

We presented a catalogue of star formation and metal- 
licity histories, dust content and stellar masses for nearly 
800,000 galaxies in SDSS' MGS and LRG samples, which 
we are now making public. The catalogue is the result of 
applying VESPA to SDSS' latest and final data release. 
VESPA has a self-regularization mechanism which gives 
an estimate of how many parameters one should recover 
from a given galaxy, given the quality of the data, and 
puts the emphasis on the robustness of the solutions, 
rather than highly-resolved star formation histories. 

Fig. 22. — The distribution of the difference between the VESPA estimated GALEX magnitudes, and the observed magnitudes. In each 
panel the solid line refers to the near-UV filter, and the dashed line to the far-UV. 

Fig. 23. — The distribution of the difference between the VESPA estimated GALEX magnitudes, and the observed magnitudes for the 
near-UV filter. In each panel the center black histogram is for all galaxies (the same as in Figure 22), the red histogram (shifted to the 
right) is for red galaxies and the blue histogram (shifted to the left) is for blue galaxies. 

Fig. 24. — The distribution of the difference between the VESPA 
estimated GALEX magnitudes, and the observed magnitudes for 
the near-UV filter. The solid line assumes a $\lambda^{-0.7}$ dust law (the 
same as in Figure 22), and the dashed line assumes a steeper 
dust law but the same values of $\tau_V$ (see text for details). 



Fig. 25. — The error on recovered GALEX near-UV magnitudes, 
taking into account photon noise on the SDSS spectra only. 

We find that the number of populations recovered 
from the MGS and the LRG sample is very different 
- whereas LRGs seem to only call for one to three 
populations given the current data quality, galaxies in 
the MGS justify a more complex model, with VESPA 
typically recovering between two and five populations. 

We explored some basic properties of the catalogue, 
and showed that the derived quantities make physical 
sense. Looking at average star formation histories 
and using a simple color cut to separate red and blue 
galaxies, we saw that red galaxies are older and have 
less dust than their blue counterparts. By averaging the 
mass formed in the last 0.115 Gyr in each galaxy, we 
computed a star formation rate which is in agreement 
with those calculated by Brinchmann et al. (2004). 

The goal of this paper is not to determine which 
set of theoretical models best describes the Universe. 
However, we note that when looking at rest-frame 
averaged star-formation histories, the combination of 
M05 synthesis models with a one-parameter dust model 
seems to give the smoothest star formation history. 
This is an indication that the mass recovered in the 
first 8 bins of VESPA is particularly sensitive to the 
choice of dust modelling (for the first four) and the 
choice of SSP modelling (for the next four). The effect 
on global properties, such as total present-day stellar 
mass, is small, as is the effect on recovered old stellar 
populations, but we recommend care when the user 
needs resolved histories on these time-scales. 

We also compared the recovered dust values 
with observed Balmer decrements, and found a 
reassuring qualitative agreement which allows us to 
reliably separate galaxies with different degrees of dust 
extinction. By extrapolating the recovered spectrum 
into the UV, we estimated GALEX magnitudes for a 
selection of galaxies which have been observed with 
GALEX. In blue galaxies, with recent star formation, 
we found a systematic offset between the estimated 
and observed magnitudes, with VESPA predicting 
systematically brighter magnitudes. We found that a 
steeper dust law in the UV would explain this offset. 
We showed that VESPA is good at predicting lack of 
star formation given the SDSS range, but there is a 
small population of galaxies for which GALEX sees 
star formation when we do not. 

In a future publication we intend to harvest the extra 
information in GALEX and other surveys that extend 
the wavelength range of the SDSS spectra by explicitly 
adding photometry to the VESPA fits. 

The catalogue is released with a variety of models. 
As we have stressed throughout the paper, our goal 
is not to immediately distinguish between different 
models but rather to reaffirm the need to take care 
when using the catalogue to draw conclusions about the 
Universe: our answers are often, although not always, 
model dependent. Giving the user a tool to know when 
and how much to worry is a necessary step in the 
right direction. As new models become available to the 
community, we will increment the catalogue accordingly. 